SkyDist: Data Mining on Skyline Objects
نویسندگان
چکیده
The skyline operator is a well established database primitive which is traditionally applied in a way that only a single skyline is computed. In this paper we use multiple skylines themselves as objects for data exploration and data mining. We define a novel similarity measure for comparing different skylines, called SkyDist. SkyDist can be used for complex analysis tasks such as clustering, classification, outlier detection, etc. We propose two different algorithms for computing SkyDist, based on Monte-Carlo sampling and on the plane sweep paradigm. In an extensive experimental evaluation, we demonstrate the efficiency and usefulness of SkyDist for a number of applications and data mining methods.
منابع مشابه
K-Dominant Skyline Computation by Using Sort-Filtering Method
Skyline queries are useful in many applications such as multicriteria decision making, data mining, and user preference queries. A skyline query returns a set of interesting data objects that are not dominated in all dimensions by any other objects. For a high-dimensional database, sometimes it returns too many data objects to analyze intensively. To reduce the number of returned objects and to...
متن کاملMining Thick Skylines over Large Databases
People recently are interested in a new operator, called skyline [3], which returns the objects that are not dominated by any other objects with regard to certain measures in a multi-dimensional space. Recent work on the skyline operator [3, 15, 8, 13, 2] focuses on efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, whi...
متن کاملRanking uncertain sky: The probabilistic top-k skyline operator
Many recent applications involve processing and analyzing uncertain data. In this paper, we combine the feature of top-k objects with that of skyline to model the problem of top-k skyline objects against uncertain data. The problem of efficiently computing top-k skyline objects on large uncertain datasets is challenging in both computing the top-k skyline objects is developed for discrete cases...
متن کاملOn Dominating Your Neighborhood Profitably
Recent research on skyline queries has attracted much interest in the database and data mining community. Given a database, an object belongs to the skyline if it cannot be dominated with respect to the given attributes by any other database object. Current methods have only considered so-called min/max attributes like price and quality which a user wants to minimize or maximize. However, objec...
متن کاملProbabilistic Skyline Queries over Uncertain Moving Objects
Data uncertainty inherently exists in a large number of applications due to factors such as limitations of measuring equipments, update delay, and network bandwidth. Recently, modeling and querying uncertain data have attracted considerable attention from the database community. However, how to perform advanced analysis on uncertain data remains an interesting question. In this paper, we focus ...
متن کامل